Prior work has shown that Visual Recognition datasets frequently underrepresent bias groups $B$ (\eg Female) within class labels $Y$ (\eg Programmers). This dataset bias can lead to models that learn spurious correlations between class labels and bias groups such as age, gender, or race. Most recent methods that address this problem require significant architectural changes or additional loss functions that demand more hyper-parameter tuning. Alternatively, data sampling baselines from the class imbalance literature (\eg Undersampling, Upweighting), which can often be implemented in a single line of code and often have no hyperparameters, offer a cheaper and more efficient solution. However, these methods suffer from significant shortcomings. For example, Undersampling drops a significant part of the input distribution, while Oversampling repeats samples, causing overfitting. To address these shortcomings, we introduce a new class-conditioned sampling method: Bias Mimicking (BM). The method is based on the observation that if a class $c$'s bias distribution, \ie $P_D(B|Y=c)$, is mimicked across every $c^{\prime}\neq c$, then $Y$ and $B$ are statistically independent. Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution without repeating samples. Consequently, Bias Mimicking improves the underrepresented groups' average accuracy over sampling methods by 3\% across four benchmarks while maintaining, and sometimes improving, performance relative to non-sampling methods. Code can be found at https://github.com/mqraitem/Bias-Mimicking
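The core observation can be illustrated with a small sketch of the mimicking subsampling step: for a chosen class $c$, every other class is subsampled so that its bias-group distribution matches $P_D(B|Y=c)$. This is only a minimal illustration of the distribution-matching idea, not the paper's full training procedure (which combines such subsampled versions so no samples are discarded overall); the function name and interface are hypothetical.

```python
import numpy as np

def mimic_bias(labels, biases, target_class, seed=0):
    """Return indices of a subsample in which every class c' != target_class
    matches the bias-group distribution of target_class (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    labels, biases = np.asarray(labels), np.asarray(biases)
    groups = np.unique(biases)

    # Bias-group distribution of the target class, P(B | Y = target_class)
    tgt_mask = labels == target_class
    tgt_dist = np.array([(biases[tgt_mask] == g).mean() for g in groups])

    keep = list(np.flatnonzero(tgt_mask))  # target class is kept whole
    for c in np.unique(labels):
        if c == target_class:
            continue
        c_idx = np.flatnonzero(labels == c)
        counts = np.array([(biases[c_idx] == g).sum() for g in groups])
        # Largest sample size from class c that can realize tgt_dist
        with np.errstate(divide="ignore", invalid="ignore"):
            n = int(np.min(np.where(tgt_dist > 0, counts / tgt_dist, np.inf)))
        for g, p in zip(groups, tgt_dist):
            g_idx = c_idx[biases[c_idx] == g]
            take = min(int(round(n * p)), len(g_idx))
            keep.extend(rng.choice(g_idx, size=take, replace=False))
    return np.sort(np.array(keep))
```

After this subsampling, the empirical bias distribution within each retained class mirrors that of the target class (up to rounding), so class label and bias group are approximately independent in the subsample.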
Phrase detection requires methods to identify whether a phrase is relevant to an image and, if applicable, to localize it. A key challenge in training more discriminative phrase detection models is sampling hard negatives, because only a few phrases are annotated out of the nearly infinite variations that may apply. To address this problem, we introduce PFP-Net, a phrase detector that differentiates between phrases through two novel methods. First, we group phrases of related objects into coarse, visually coherent concepts (\eg animals vs. automobiles) and train our PFP-Net to discriminate between them according to their concept membership. Second, for phrases containing fine-grained, mutually exclusive tokens (\eg colors), we force the model to select only one applicable phrase for each region. We evaluate our approach on the Flickr30K Entities and RefCOCO+ datasets, where we improve mAP by 1-1.5 points over all phrases on this challenging task. When considering only the phrases affected by our fine-grained reasoning module, we improve by 1-4 points on both datasets.
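The second idea, selecting only one phrase from a mutually exclusive group per region, can be sketched as a simple winner-take-all step over per-phrase region scores. This is a hypothetical illustration of the constraint, not the paper's actual mechanism, which enforces exclusivity during training via a loss.

```python
import numpy as np

def select_exclusive(scores):
    """Given detection scores for a set of mutually exclusive phrases
    (e.g. 'red shirt' vs. 'blue shirt') on one region, keep only the
    top-scoring phrase and suppress the rest (illustrative sketch)."""
    scores = np.asarray(scores, dtype=float)
    out = np.zeros_like(scores)
    winner = int(np.argmax(scores))
    out[winner] = scores[winner]
    return out
```

At inference time, such a step prevents a region from being matched to several color variants of the same phrase at once; during training, the analogous constraint supplies hard negatives "for free", since every non-selected phrase in the group is known to be inapplicable.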